Prisoner's Dilemma: non-trivial results for the lowest temptation 
level in the Darwinian and Pavlovian evolutionary strategies 
Abstract. The lowest temptation level (T = 1) is considered a trivial case for the Prisoner's Dilemma. Here, we show that this statement is true only for a very particular case, where the players interact with only one player. Otherwise, if the players interact with more than one player, the system presents the same possible behaviors observed for T > 1. In the steady state, the system can reach the cooperative, chaotic or defective phases when adopting the Darwinian Evolutionary Strategy, and the cooperative or quasi-regular phases when adopting the Pavlovian Evolutionary Strategy.
In the Prisoner's Dilemma (PD), two players confront each other and each one can either cooperate (C) or defect (D). Under mutual cooperation, both players receive a payoff R (reward). If both are defectors, their payoff is P (punishment). When a player cooperates and the other defects, they receive S (sucker) and T (temptation), respectively [1]. The dilemma appears under the conditions T > R > P > S and R > (S + T)/2 [1]. In these circumstances, the best individual strategy is to defect, because it assures a higher payoff than cooperation, independently of the opponent's decision (Nash equilibrium). However, the best global strategy is cooperation.
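The two conditions above can be checked mechanically. The sketch below is only illustrative: the function name `is_prisoners_dilemma` is ours, and the values T = 5, R = 3, P = 1, S = 0 are the ones commonly used in Axelrod's tournaments, not values used in this work. It also shows that the T = 1 payoffs chosen later in the text (T = R = 1) sit on the boundary T = R of the first condition.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the PD conditions T > R > P > S and R > (S + T) / 2."""
    # 2R > S + T is the second condition written without the division.
    return T > R > P > S and 2 * R > S + T


# Commonly used tournament payoffs satisfy both conditions.
standard = is_prisoners_dilemma(T=5, R=3, P=1, S=0)

# The T = 1 payoffs used here have T = R (and P = S for both the DES and the PES),
# so the strict inequalities fail: T = 1 is the limiting case of the dilemma.
lowest_des = is_prisoners_dilemma(T=1, R=1, P=0, S=0)
lowest_pes = is_prisoners_dilemma(T=1, R=1, P=-1, S=-1)
```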

When the PD is played repeatedly, it is called the Iterated Prisoner's Dilemma (IPD). A computer tournament was proposed by Axelrod in the 1980s [2,3] to compare different strategies playing the IPD. The winning strategy was tit-for-tat (TFT), in which a player cooperates in the first round and subsequently copies the opponent's action from the previous round. It has a memory of only one time step. The emergence of cooperation as a stable strategy made the PD popular [4].
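The TFT rule described above needs only the opponent's last action. A minimal sketch, assuming a hypothetical encoding 1 = cooperate and 0 = defect and hypothetical helper names (`tit_for_tat`, `always_defect`, `play_ipd`):

```python
C, D = 1, 0  # hypothetical encoding of the two actions


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous action."""
    return C if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Reference opponent that never cooperates."""
    return D


def play_ipd(strategy_a, strategy_b, rounds):
    """Play `rounds` rounds of the IPD and return both action histories."""
    history_a, history_b = [], []
    for _ in range(rounds):
        # Both players decide simultaneously from the opponent's past actions.
        action_a = strategy_a(history_b)
        action_b = strategy_b(history_a)
        history_a.append(action_a)
        history_b.append(action_b)
    return history_a, history_b


tft_moves, alld_moves = play_ipd(tit_for_tat, always_defect, rounds=4)
# tft_moves == [1, 0, 0, 0]: TFT cooperates once, then mirrors the defections.
```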

In a spatial PD game, all players play against their neighbors, sum the payoffs and update their states. The rules to update the states depend on the adopted strategy. Here, we consider two deterministic strategies based on the total payoff: the Darwinian Evolutionary Strategy (DES) [5,6] and the Pavlovian Evolutionary Strategy (PES). A player adopting the DES copies the state of the best adapted player (fittest player) among the interacting players. This behavior can be compared to Darwin's principle of natural selection, the "survival of the fittest" [H]. The PES is a win-stay, lose-shift strategy [TS]. Given an aspiration level (AL) [11], a player compares his/her total payoff with it. If the total payoff is greater than the AL, the player keeps his/her current state; otherwise, he/she switches it. This behavior can be thought of as: "never change a winning team".
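The two update rules can be sketched for a single player as follows. States are 1 (C) and 0 (D); the function names are hypothetical, and the tie-breaking in the DES (own state first, then the first neighbor found) is our assumption, since the text does not specify it.

```python
def des_update(own_state, own_payoff, neighbor_states, neighbor_payoffs):
    """Darwinian rule: adopt the state of the fittest interacting player.

    Ties favour the player's own state, then the first neighbor found (assumption).
    """
    best_state, best_payoff = own_state, own_payoff
    for state, payoff in zip(neighbor_states, neighbor_payoffs):
        if payoff > best_payoff:
            best_state, best_payoff = state, payoff
    return best_state


def pes_update(own_state, own_payoff, aspiration_level=0.0):
    """Pavlovian win-stay, lose-shift rule with states 1 (C) and 0 (D)."""
    if own_payoff > aspiration_level:
        return own_state      # win: keep the current state
    return 1 - own_state      # lose: switch state
```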



The main variable in the PD problem is the temptation payoff. Some temptation values yield total payoffs that force the players to switch their states, which causes a phase transition. These are the critical temptation values, and they depend on the adopted strategy, on the system connectivity and on the neighborhood configuration [15]. Here, we only consider the particular case T = 1, the lowest temptation level to defect. We choose the following payoffs: for the DES, R = 1 and P = S = 0; for the PES, R = 1, P = -R = -1 and S = -T = -1. The AL for the PES is to receive a positive payoff. When playing against a cooperator, a cooperator and a defector receive the same payoff, since T = 1 = R. Due to this result, a previous study [16] has explicitly considered T = 1 as a trivial case (other authors do not even mention this case), since players do not switch their states during the time evolution. Here, we show that this statement is true only for players using the DES in the case of each player interacting with only one neighbor. If players play with more than one neighbor, they can switch their states, and in fact they do. In the following, we show that T = 1 is indeed a non-trivial case.

Now, consider players located at the sites of a one-dimensional lattice (with periodic boundary conditions). Each player can cooperate or defect, and plays the IPD with z neighbors (coordination number). If z is even, the player interacts with z/2 players on the right-hand side and z/2 on the left one. If z is odd, the player self-interacts (plays against his/her own state) and interacts with (z - 1)/2 players on each side. The order parameter of the system is the asymptotic proportion of cooperators, $\rho_\infty$. This geometry has been studied in Refs. [15,17,18] and gives the same critical temptation values as regular lattices, since these values depend only on the connectivity of the system. It also allows one to explore numerically the whole parameter space [18], which permitted us to detect a new phase (quasi-regular) for the Pavlov strategy [15].
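The neighborhood geometry described above can be written down directly. A minimal sketch with a hypothetical helper `neighborhood`, assuming the ring is longer than the neighborhood (L > z) so that no site is counted twice:

```python
def neighborhood(i, z, L):
    """Indices of the z opponents of player i on a ring of L sites.

    Even z: z/2 neighbors on each side.
    Odd z: the player itself (self-interaction) plus (z - 1)/2 neighbors per side.
    Assumes L > z so that no site appears twice.
    """
    half = z // 2
    sites = [(i + d) % L for d in range(-half, half + 1) if d != 0]
    if z % 2 == 1:
        sites.append(i)  # self-interaction
    return sites
```

For z = 1 the list contains only the player itself, which is the single-interaction case in which, under the DES, nobody can switch.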

The variable $\theta_i$ represents the state of the i-th player: if $\theta_i = 0$, the player defects, and if $\theta_i = 1$, the player cooperates. If a player interacts with more than one player, the total payoff (sum of all his/her payoffs per interaction) is given by [H]:

$$G_i^{(z;\,c_i)}(\theta_i = 1) = R\,c_i + S\,(z - c_i), \qquad G_i^{(z;\,c_i)}(\theta_i = 0) = T\,c_i + P\,(z - c_i).$$

The quantities T, R, P and S are the PD payoff values; z is the system connectivity; and $c_i$ is the number of cooperators in the i-th player's neighborhood. During the time evolution, players can organize themselves in well-defined cooperative or defective clusters, separated by borders. Clusters interact among themselves, establishing invasion processes. The inner members of a cooperative cluster have higher payoffs than those of a defective one, and the payoff differences are more pronounced along the borders. On the one hand, for players adopting the DES, these payoff differences along the borders are essential: they determine which players switch their states, since switching results from payoff comparisons. In large clusters, by contrast, inner players do not switch their states, since every player there has the same state and payoff. Cooperators (defectors) invade defective (cooperative) clusters through the cluster border players [17]. On the other hand, for players adopting the PES, the switching process can take place anywhere in the cluster. Inner players can switch their states, since they do not compare their payoffs with those of their neighbors but with their own aspiration level. If their payoffs do not reach this aspiration level, they switch their states [15].
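The total payoff and the contrast between the two rules for inner players can be made concrete with a short sketch. The function name `total_payoff` and the chosen z are illustrative; the payoff values are those chosen in the text for T = 1.

```python
def total_payoff(theta, c, z, T, R, P, S):
    """G^(z; c)(theta): total payoff of a player with c cooperating opponents out of z."""
    if theta == 1:                      # cooperator
        return R * c + S * (z - c)
    return T * c + P * (z - c)          # defector


DES = dict(T=1, R=1, P=0, S=0)
PES = dict(T=1, R=1, P=-1, S=-1)

# With T = R and P = S both formulas coincide for the same c: a single
# interaction rewards only the opponent's state, so total payoffs differ
# only because players have different numbers of cooperating opponents.
same_des = total_payoff(1, 2, 4, **DES) == total_payoff(0, 2, 4, **DES)

# PES, inner player of a large cluster (all z opponents share its state):
z = 4
inner_cooperator = total_payoff(1, z, z, **PES)   # z > 0: keeps cooperating
inner_defector = total_payoff(0, 0, z, **PES)     # -z <= 0: switches to cooperation
```

Under the DES, the same inner players would copy a neighbor with the same state, so they keep their states regardless of the payoff values.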

We have performed numerical simulations in which each simulation consists of distributing the players on a one-dimensional lattice with L agents, of which $\rho_0 L$ are randomly set as cooperators and the remaining ones as defectors. The agents then play the IPD with z neighbors and update their states according to the adopted strategy. This process is repeated until the system reaches a steady state. To calculate the mean value of the asymptotic proportion of cooperators and its standard deviation, we repeat the simulation for M = 1000 realizations, with different initial configurations of cooperators for each parameter set. We present the results of the exhaustive exploration of the parameter space, $\rho_\infty(T = 1; \rho_0, z)$, and the associated standard deviation, SD, for players adopting the DES and the PES. The exhaustive exploration is done using the following ranges and steps: $0 < \rho_0 < 1$ with steps $\Delta\rho_0 = 0.1$, and $2 \le z \le 30$ with integer values.
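The simulation protocol above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: all players update synchronously, a run stops at a fixed point or after a hypothetical cap `t_max` of time steps, DES ties favour the player's own state, and the defaults for `L`, `M` and `t_max` are small placeholders (the text uses M = 1000 and does not state L). The names `neighborhood`, `simulate` and `explore` are hypothetical.

```python
import random
import statistics

# Payoffs (T, R, P, S) used in the text for the lowest temptation level T = 1.
PAYOFFS = {"DES": (1, 1, 0, 0), "PES": (1, 1, -1, -1)}


def neighborhood(i, z, L):
    """Opponents of i: z/2 per side (even z), or self plus (z - 1)/2 per side (odd z)."""
    half = z // 2
    sites = [(i + d) % L for d in range(-half, half + 1) if d != 0]
    if z % 2 == 1:
        sites.append(i)  # self-interaction
    return sites


def simulate(L, rho0, z, rule, t_max, rng):
    """One realization; returns the final proportion of cooperators."""
    T, R, P, S = PAYOFFS[rule]
    n_coop = round(rho0 * L)
    states = [1] * n_coop + [0] * (L - n_coop)
    rng.shuffle(states)  # random initial configuration with rho0 * L cooperators
    nbrs = [neighborhood(i, z, L) for i in range(L)]
    for _ in range(t_max):
        # Total payoffs: R c + S (z - c) for cooperators, T c + P (z - c) for defectors.
        G = []
        for i in range(L):
            c = sum(states[j] for j in nbrs[i])
            G.append(R * c + S * (z - c) if states[i] == 1 else T * c + P * (z - c))
        new = []
        for i in range(L):
            if rule == "DES":
                # Copy the fittest interacting player (own state wins ties).
                best = i
                for j in nbrs[i]:
                    if G[j] > G[best]:
                        best = j
                new.append(states[best])
            else:
                # PES: stay if the payoff exceeds the aspiration level 0, else switch.
                new.append(states[i] if G[i] > 0 else 1 - states[i])
        if new == states:  # fixed point: steady state reached
            break
        states = new
    return sum(states) / L


def explore(rho0, z, rule, L=100, M=20, t_max=200, seed=0):
    """Mean and (population) standard deviation of rho_infinity over M realizations."""
    rng = random.Random(seed)
    samples = [simulate(L, rho0, z, rule, t_max, rng) for _ in range(M)]
    return statistics.mean(samples), statistics.pstdev(samples)
```

For example, `explore(0.5, 4, "DES")` returns the sample mean and standard deviation of $\rho_\infty$ at one grid point; looping over $\rho_0$ and z gives an exploration with the same structure as the one described, though not necessarily the published values.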

The state switching processes drive the system to a steady state that defines a phase [15,17,18]. In a naive analysis, when the system exhibits a majority of cooperators ($\rho_\infty > 0.5$), we say that the system is in the cooperative phase. If a majority of defectors occurs ($\rho_\infty < 0.5$), we have the defective phase. A chaotic phase can also occur, characterized by a high sensitivity to small changes in the initial configuration, leading to large $\rho_\infty$ fluctuations. As shown in Ref. [15], in the quasi-regular phase the proportion of cooperators oscillates around $\rho_\infty \approx 0.5$, with many players switching their states at each time step, even in the steady state. In the cooperative, defective and quasi-regular phases, the $\rho_\infty$ fluctuations (standard deviation) are very small, in contrast to the chaotic phase. For a system whose players adopt the DES, the cooperative, defective or chaotic phases may occur; if players adopt the PES, the possible outcomes are the cooperative or quasi-regular phases [15].
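The naive classification above can be sketched as a function of the mean and SD of $\rho_\infty$ over realizations. The thresholds `sd_chaotic` and `tol` are hypothetical; the text does not give numerical cutoffs, and the function name is ours.

```python
def classify_phase(rho_mean, rho_sd, rule, sd_chaotic=0.25, tol=0.05):
    """Naive phase label from the mean and SD of rho_infinity.

    sd_chaotic and tol are hypothetical thresholds, not values from the text.
    """
    if rule == "DES":
        if rho_sd > sd_chaotic:
            return "chaotic"            # large realization-to-realization fluctuations
        if rho_mean > 0.5:
            return "cooperative"
        if rho_mean < 0.5:
            return "defective"
        return "undetermined"           # exactly 0.5 is not covered by the naive rule
    # PES: only the cooperative and quasi-regular phases are expected.
    if abs(rho_mean - 0.5) < tol:
        return "quasi-regular"          # rho_infinity oscillates around 0.5
    return "cooperative"
```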

To compare our results, consider T = 1 as a trivial case. As currently believed, in this case no player switches his/her state during the time evolution, and the system keeps its initial configuration ($\rho_\infty = \rho_0$) for every parameter set. For T = 1, the $\rho_\infty$ surface, as a function of $\rho_0$ and z, would then be the plane depicted in Fig. 1.




Fig. 1: If T = 1 were a trivial case, the $\rho_\infty$ surface, as a function of $\rho_0$ and z, would not be altered from the initial configuration by the time evolution.



The $\rho_\infty(T = 1; \rho_0, z)$ surfaces for the DES and their standard deviations are depicted in Fig. 2. Figs. 2a (without self-interaction, even z values) and 2b (with self-interaction, odd z values) show that players switch their states, because $\rho_\infty$ does not form the plane displayed in Fig. 1: players have not kept their initial states during the system evolution. Players switch their states because they compare total payoffs, not the payoff per play of each pair of players. Comparing the systems with and without self-interaction, one notices that the cooperative phase is more prominent than the defective one in both cases. However, self-interaction enlarges the cooperative phase in comparison with the case without self-interaction. When players self-interact, a cooperator receives at least one positive payoff, while a defector receives a null payoff from its self-interaction. In this way, self-interaction is advantageous to cooperators, and they can



Marcelo Alves Pereira, Alexandre Souto Martinez: Title Suppressed Due to Excessive Length 






be replicated more easily (higher payoff), so cooperation emerges in the system. Figs. 2c and 2d show the fluctuations of $\rho_\infty$ for even and odd z values, respectively. In these plots, $\mathrm{SD} \approx 0.5$ indicates that the system is in the chaotic phase in that region of the parameter space. For T = 1, the chaotic phase is present only for even z values, without self-interaction (see Fig. 2c). It occurs between the cooperative and defective phases (see the cliff in Fig. 2a) as $\rho_0$ decreases. When players self-interact, cooperation increases and the chaotic phase does not appear at all (see Fig. 2d).
Fig. 2: Surfaces of $\rho_\infty$ and their fluctuations as functions of $\rho_0$ and z, for T = 1. Players adopt the DES. Specifically: (a) $\rho_\infty$ for even z values, with $z \in [2, 30]$ (without self-interaction); (b) $\rho_\infty$ for odd z values, with $z \in [3, 29]$ (with self-interaction); (c) fluctuations of $\rho_\infty$ displayed in (a); (d) the same for (b). Projections of $\rho_\infty$ onto the $\rho_\infty\rho_0$ plane: (e) blue: z = 2, green: z = 4, red: z = 8, cyan: z = 16, magenta: z = 28, yellow: z = 30, black: result if players did not change their states; (f) blue: z = 3, green: z = 5, red: z = 9, cyan: z = 19, magenta: z = 27, yellow: z = 29, black: trivial case.

The projection of $\rho_\infty(1; \rho_0, z)$ onto the $\rho_\infty\rho_0$ plane represents $\rho_\infty$ as a function of $\rho_0$ for different z values. Fig. 2e displays these projections for even z values (without self-interaction) and Fig. 2f for odd ones (with self-interaction). In Fig. 2f, cooperation emerges for $\rho_0 > 0.1$, due to the presence of self-interaction, which increases the cooperators' payoff, as explained above. In Fig. 2e, $\rho_\infty$ values can be greater or lower than the "expected" value $\rho_\infty = \rho_0$. For $\rho_0 < 0.4$, $\rho_\infty < \rho_0$ and $\mathrm{SD} \approx 0.5$. On the one hand, if the proportion of cooperators is small ($\rho_0 \approx 0$), these cooperators receive several null payoffs (due to their interactions with defectors) and their total payoffs decrease, while the defectors' payoffs increase; consequently, cooperators switch their states and cooperation does not emerge. On the other hand, when $\rho_0 > 0.4$, more cooperators exist in the system at the beginning of the time evolution. Cooperators playing against each other receive a positive payoff. The total payoff of these cooperators becomes greater than that of defectors that confronted other defectors, so these cooperators do not switch. The defectors, despite exploiting these cooperators, can also confront other defectors, which decreases their total payoff and leads them to switch their states, driving the system to the cooperative phase. For z = 2, however, there is one exception: cooperation does not emerge, because a cooperator j that plays against a cooperator i and a defector k has a payoff $G_j = 1$. If the defector k interacts with another cooperator besides j (remember that z = 2), he/she has a payoff $G_k = 2$, and cooperator j copies player k's state. In this way, defection becomes the main behavior of the players, giving rise to the defective phase. This allows us to conclude that higher connectivity favors cooperation in this case, because it increases the chance of a cooperator interacting with other cooperators.
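The payoffs in the z = 2 exception follow directly from the total-payoff formula with the DES values R = T = 1 and P = S = 0 (cooperator j has one cooperating and one defecting opponent; defector k has two cooperating opponents):

```latex
G_j = R\cdot 1 + S\cdot 1 = 1\cdot 1 + 0\cdot 1 = 1,
\qquad
G_k = T\cdot 2 + P\cdot 0 = 1\cdot 2 + 0 = 2 > G_j .
```

Since $G_k > G_j$, player j, following the DES, adopts the state of player k.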

For the PES, the payoff of two defectors playing against each other is negative (P = -1). This is enough for both to switch their states, but they can still exploit the cooperators in their neighborhoods. If, even when exploiting their neighbors, their payoffs do not become positive, they switch their states and cooperation emerges in the system. The $\rho_\infty(T = 1; \rho_0, z)$ surfaces for the PES and the associated standard deviations are depicted in Fig. 3. Figs. 3a (without self-interaction, even z values) and 3b (with self-interaction, odd z values) are very different from those observed for the DES (Figs. 2a and 2b). Nevertheless, they also confirm that the proportion of cooperators evolves in time for T = 1. Here, the majority of the players cooperate in the whole parameter space, with the exception of z = 2 and z = 4 (without self-interaction, see Fig. 3a), where the quasi-regular phase occurs, with $\rho_\infty \approx 0.5$. Notice that $\rho_\infty$ decreases as z increases, more steeply in the presence of self-interaction, because a defector always receives a negative payoff from his/her self-interaction. Also, $\rho_\infty$ is symmetric with respect to $\rho_0 = 1/2$: for $\rho_0 = \beta$, while rL players receive a positive payoff, (1 - r)L players receive a negative one, whereas for $\rho_0 = 1 - \beta$, (1 - r)L players receive a positive payoff and rL players receive a negative one, where r is an arbitrary proportion of players that depends on the distribution of the players at each time step [15]. Figs. 3c and 3d show the SD for even and odd z values, respectively. In these plots, the small SD values show the absence of the chaotic phase. Only the quasi-regular phase (z = 2 and 4) has a slightly higher SD.
Fig. 3e displays the projections of $\rho_\infty$ onto the $\rho_\infty\rho_0$ plane for even z values (without self-interaction), and Fig. 3f for odd ones (with self-interaction). These plots demonstrate that the cooperative phase is dominant. The quasi-regular phase occurs only for z = 2 and z = 4. The proportion of cooperators is below the "expected" value only in the quasi-regular phase.
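The symmetry argument for the PES can be checked numerically under our own assumptions (synchronous update, the PES payoffs of the text, hypothetical helper names). With T = R = 1 and P = S = -1, both payoff formulas reduce to $G_i = 2c_i - z$; complementing every state maps $c_i \to z - c_i$ and hence $G_i \to -G_i$, so the players with positive and negative payoffs exchange roles. For odd z, $G_i$ is odd and never zero, and the sketch checks that a configuration and its complement then have the same successor, so runs started from $\rho_0$ and $1 - \rho_0$ evolve identically after one step. For even z, players with $G_i = 0$ break this exact correspondence.

```python
import random


def neighborhood(i, z, L):
    """Opponents of i: z/2 per side (even z), or self plus (z - 1)/2 per side (odd z)."""
    half = z // 2
    sites = [(i + d) % L for d in range(-half, half + 1) if d != 0]
    if z % 2 == 1:
        sites.append(i)  # self-interaction
    return sites


def pes_step(states, z):
    """One synchronous PES update with T = R = 1, P = S = -1, so G_i = 2 c_i - z."""
    L = len(states)
    new = []
    for i in range(L):
        c = sum(states[j] for j in neighborhood(i, z, L))
        g = 2 * c - z  # same expression for cooperators and defectors
        new.append(states[i] if g > 0 else 1 - states[i])
    return new


rng = random.Random(1)
L, z = 60, 5
config = [rng.randint(0, 1) for _ in range(L)]
complement = [1 - s for s in config]
same_successor = pes_step(config, z) == pes_step(complement, z)
```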





Fig. 3: Surfaces of $\rho_\infty$ and their fluctuations as functions of $\rho_0$ and z, for T = 1. Players adopt the PES. Specifically: (a) $\rho_\infty$ for even z values, with $z \in [2, 30]$ (without self-interaction); (b) $\rho_\infty$ for odd z values, with $z \in [3, 29]$ (with self-interaction); (c) fluctuations of $\rho_\infty$ displayed in (a); (d) the same for (b). Projections of $\rho_\infty$ onto the $\rho_\infty\rho_0$ plane: (e) blue: z = 2, green: z = 4, red: z = 8, cyan: z = 16, magenta: z = 28, yellow: z = 30, black: result if players did not change their states; (f) blue: z = 3, green: z = 5, red: z = 9, cyan: z = 19, magenta: z = 27, yellow: z = 29, black: trivial case.

The results presented here show that, for T = 1, the system is neither static nor trivial, as previously supposed, when players play the IPD with more than one neighbor (z > 1). On the one hand, if players adopt the DES, cooperative, defective and chaotic phases may be present. The chaotic phase appears only for even z values (without self-interaction). The most astonishing result is the cooperative phase ($\rho_\infty \approx 1$), which is present for $\rho_0 > 0.5$ without self-interaction and for $\rho_0 > 0.1$ with it. On the other hand, for players adopting the PES, $\rho_\infty(1; \rho_0, z > 1)$ decreases as z increases, with the exception of z = 2 and 4, and the decrease is more pronounced when self-interaction is present. For z = 2 and 4, the system presents the quasi-regular phase, which we first pointed out in Ref. [15]. The symmetry of $\rho_\infty$ can be explained by equivalence arguments. Cooperation emerges even when cooperators and defectors have the same payoff in the IPD. Increasing the connectivity favors cooperation for the DES, but it decreases cooperation for the PES. The initial proportion of cooperators is not a relevant parameter in this problem: it has less influence on the results than the other parameters.